Import packages used in this notebook
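
The original import cell is not shown here. As a minimal sketch, assuming the notebook relies on pandas for the tables and matplotlib for the plots (both are assumptions, not confirmed by the text), it might look like this:

```python
# Assumed imports; the notebook's actual import cell is not shown.
import pandas as pd               # tabular data (ethnicity table, VCF body)
import matplotlib.pyplot as plt   # bar plots and pie chart
```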

1. 1000G ethnicities

a) Loading the ethnicity file
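
A possible way to do this step with pandas is sketched below. The helper name `load_ethnicities`, the path `ethnicities.tsv` and the tab separator are all hypothetical, since the real file name and layout are not given here.

```python
import pandas as pd

def load_ethnicities(path_or_buffer, sep="\t"):
    """Read a delimited sample-annotation table into a DataFrame.

    The tab separator is an assumption; pass a different `sep` if needed.
    """
    return pd.read_csv(path_or_buffer, sep=sep)

# Hypothetical usage; replace the path with the real ethnicity file:
# ethnicities = load_ethnicities("ethnicities.tsv")
# ethnicities.head()
```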

b), c), d) Plotting the 3rd, 4th and 5th columns as bar plots to summarize the number of samples per category
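
One way to produce these three plots is to count the values in each column and draw the counts as bars. The sketch below uses a made-up toy table; its column names and values are illustrative assumptions, not the real annotation file.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_category_counts(df, column_positions=(2, 3, 4)):
    """Draw one bar plot of value counts per column.

    Positions are 0-based, so (2, 3, 4) selects the 3rd, 4th and 5th columns.
    """
    n = len(column_positions)
    fig, axes = plt.subplots(1, n, figsize=(5 * n, 4), squeeze=False)
    for ax, pos in zip(axes[0], column_positions):
        counts = df.iloc[:, pos].value_counts()   # samples per category
        counts.plot.bar(ax=ax)
        ax.set_title(df.columns[pos])
        ax.set_ylabel("Number of samples")
    fig.tight_layout()
    return fig, axes[0]

# Toy data with hypothetical column names and made-up sample assignments.
toy = pd.DataFrame({
    "sample": ["S1", "S2", "S3", "S4"],
    "family": ["F1", "F2", "F3", "F4"],
    "population": ["GBR", "GBR", "YRI", "CHB"],
    "superpopulation": ["EUR", "EUR", "AFR", "EAS"],
    "gender": ["male", "female", "female", "male"],
})
fig, axes = plot_category_counts(toy)
```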

2. Reading the VCF file
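
The notebook's own reader is not shown. A minimal pandas-based sketch is given below, assuming a standard VCF layout: meta-information lines starting with `##`, followed by a tab-separated column header line starting with `#CHROM`. The helper name `read_vcf` is hypothetical.

```python
import gzip
import pandas as pd

def read_vcf(path):
    """Read the body of a VCF file into a DataFrame, skipping '##' meta lines."""
    opener = gzip.open if str(path).endswith(".gz") else open
    n_meta = 0
    with opener(path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                n_meta += 1      # count leading meta-information lines
            else:
                break            # reached the '#CHROM' column header line
    # pandas infers gzip compression from a '.gz' extension.
    vcf = pd.read_csv(path, sep="\t", skiprows=n_meta, dtype={"#CHROM": str})
    return vcf.rename(columns={"#CHROM": "CHROM"})

# Hypothetical usage; replace the path with the real VCF file:
# vcf = read_vcf("variants.vcf.gz")
# sample_ids = list(vcf.columns[9:])   # sample columns follow FORMAT
```

Reading a large VCF fully into memory this way can be slow; for big files a dedicated parser may be a better fit.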

Check the header of the file with the %%bash cell magic:
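
The shell command used in the original cell is not shown. As a rough Python alternative for peeking at the first lines of the file, a sketch could look like the following; the helper name `head` and the example path are hypothetical.

```python
import gzip
from itertools import islice

def head(path, n=10):
    """Return the first n lines of a plain or gzip-compressed text file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return [line.rstrip("\n") for line in islice(handle, n)]

# Hypothetical usage; replace the path with the real VCF file:
# for line in head("variants.vcf.gz", n=5):
#     print(line)
```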

3. Merge the ethnicity and VCF datasets

a) Get samples with missing ethnicity annotation
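
A sketch of this step: compare the sample IDs found in the VCF (the columns after `FORMAT`) with the IDs listed in the ethnicity table, and keep those with no match. The column name `sample` and the toy values are assumptions for illustration.

```python
import pandas as pd

def samples_missing_annotation(vcf_samples, ethnicities, sample_col="sample"):
    """Return a one-column DataFrame of VCF samples absent from the ethnicity table."""
    annotated = set(ethnicities[sample_col])
    missing = [s for s in vcf_samples if s not in annotated]
    return pd.DataFrame({sample_col: missing})

# Toy data (made up): S3 appears in the VCF but has no annotation.
ethnicities = pd.DataFrame({
    "sample": ["S1", "S2"],
    "population": ["GBR", "YRI"],
})
vcf_samples = ["S1", "S2", "S3"]   # e.g. list(vcf.columns[9:])
missing = samples_missing_annotation(vcf_samples, ethnicities)
```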

b) Merge the ethnicities dataframe with the dataframe of samples that have missing annotations
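
One plausible reading of this step is to append the unannotated samples to the ethnicity table, with every annotation column set to 'Unknown'. The sketch below does this with `pd.concat`; the helper name, column names and toy values are hypothetical.

```python
import pandas as pd

def add_unknown_samples(ethnicities, missing, sample_col="sample", label="Unknown"):
    """Append unannotated samples, filling every annotation column with `label`."""
    filled = missing[[sample_col]].copy()
    for col in ethnicities.columns:
        if col != sample_col:
            filled[col] = label              # placeholder annotation
    filled = filled[ethnicities.columns]     # match the original column order
    return pd.concat([ethnicities, filled], ignore_index=True)

# Toy data (made up).
ethnicities = pd.DataFrame({
    "sample": ["S1", "S2"],
    "population": ["GBR", "YRI"],
})
missing = pd.DataFrame({"sample": ["S3"]})
combined = add_unknown_samples(ethnicities, missing)
```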

c) Generate a pie chart of the final ethnicities, including the 'Unknown' samples
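
A pie chart of this kind can be sketched with matplotlib by counting the samples per category, 'Unknown' included. The column name `population` and the toy values below are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_ethnicity_pie(annotations, column):
    """Draw a pie chart of the number of samples per category in `column`."""
    counts = annotations[column].value_counts()
    fig, ax = plt.subplots(figsize=(6, 6))
    ax.pie(counts.values, labels=counts.index, autopct="%1.1f%%")
    ax.set_title(f"Samples per {column}")
    return fig, ax

# Toy data (made up), including one 'Unknown' sample.
combined = pd.DataFrame({
    "sample": ["S1", "S2", "S3", "S4"],
    "population": ["GBR", "GBR", "YRI", "Unknown"],
})
fig, ax = plot_ethnicity_pie(combined, "population")
```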